IN THE CLAIMS: 

Please amend Claims 1 and 24-29 as listed below. This listing of claims will replace all 
prior versions, and listings, of claims in the application. 

1. (Currently Amended) A computer implemented method comprising: 

responsive to a first single instruction, identifying a first operand including four 
packed data elements, (r3, r2, r1 and r0), and identifying a second operand including four 
packed coefficients, (w3, w2, w1 and w0), generating four packed first products, (r3w3, 
r2w2, r1w1 and r0w0) and storing said four packed first products at a first destination 
identified by said first single instruction; 

responsive to a second single instruction, identifying a third operand including 
four packed data elements, (s3, s2, s1 and s0), and identifying a fourth operand including 
four packed coefficients, (w7, w6, w5 and w4), generating four packed second products, 
(s3w7, s2w6, s1w5 and s0w4) and storing said four packed second products at a second 
destination identified by said second single instruction; and 

responsive to a third single instruction, identifying a fifth operand including the 
four packed first products and identifying a sixth operand including the four packed 
second products, generating four packed sums, (s2w6+s3w7, s0w4+s1w5, r2w2+r3w3, 
and r0w0+r1w1) and storing them at a third destination identified by said third single 
instruction. 
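
For illustration only, the three operations recited in claim 1 can be modeled in software. The following is a minimal Python sketch, assuming that plain lists stand in for packed operands with element 0 listed first, and that the function names `packed_multiply` and `horizontal_add` are hypothetical labels that do not appear in the claims. It is not a description of any particular hardware implementation.

```python
def packed_multiply(data, coeffs):
    """Multiply corresponding elements of two 4-element packed operands."""
    return [d * w for d, w in zip(data, coeffs)]


def horizontal_add(first, second):
    """Add adjacent pairs inside each operand.

    The two low result elements come from `first`, the two high
    result elements come from `second` (assumed element ordering).
    """
    return [
        first[0] + first[1],
        first[2] + first[3],
        second[0] + second[1],
        second[2] + second[3],
    ]


# Example values (arbitrary, chosen only for illustration).
r = [1, 2, 3, 4]         # r0, r1, r2, r3
w_low = [5, 6, 7, 8]     # w0, w1, w2, w3
s = [9, 10, 11, 12]      # s0, s1, s2, s3
w_high = [1, 2, 3, 4]    # w4, w5, w6, w7

first_products = packed_multiply(r, w_low)     # r0w0, r1w1, r2w2, r3w3
second_products = packed_multiply(s, w_high)   # s0w4, s1w5, s2w6, s3w7

# r0w0+r1w1, r2w2+r3w3, s0w4+s1w5, s2w6+s3w7
sums = horizontal_add(first_products, second_products)
print(sums)
```

With these example values the sketch prints `[17, 53, 29, 81]`, listed from the lowest element to the highest.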

2. (Original) The method of claim 1 further comprising: 

responsive to the first single instruction, overwriting said four packed data 
elements in the first operand with said four packed first products; 

responsive to the second single instruction, overwriting said four packed data 
elements in the second operand with said four packed second products; and 

responsive to the third single instruction, overwriting said four packed first 
products in the fifth operand with said four packed sums. 

3. (Original) The method of claim 1 further comprising: 

responsive to a fourth single instruction identifying a seventh operand including 
the four packed first products and identifying an eighth operand including the four 
packed second products, generating four packed differences, (s2w6-s3w7, s0w4-s1w5, 
r2w2-r3w3, and r0w0-r1w1) and storing them at a fourth destination identified by said 
fourth single instruction. 

4. (Original) The method of claim 3, said fourth destination storing elements to 
represent saturated packed differences, (s2w6-s3w7, s0w4-s1w5, r2w2-r3w3, and 
r0w0-r1w1). 
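
As an illustration of the saturated packed differences of claims 3 and 4, the sketch below clamps each pairwise difference to a fixed range. It is a minimal Python sketch under stated assumptions: claim 4 does not state an element width, so signed 16-bit saturation is assumed here, lists stand in for packed operands with element 0 first, and the names `saturate16` and `horizontal_sub_saturated` are hypothetical.

```python
INT16_MIN, INT16_MAX = -32768, 32767  # assumed signed 16-bit range


def saturate16(value):
    """Clamp a value to the assumed signed 16-bit range."""
    return max(INT16_MIN, min(INT16_MAX, value))


def horizontal_sub_saturated(first, second):
    """Subtract adjacent pairs (lower element minus higher element)
    inside each operand and saturate every difference."""
    pairs = [
        (first[0], first[1]),
        (first[2], first[3]),
        (second[0], second[1]),
        (second[2], second[3]),
    ]
    return [saturate16(low - high) for low, high in pairs]


# Example values chosen so that two differences overflow 16 bits.
first_products = [30000, -10000, 5, 7]
second_products = [-30000, 10000, 100, 40]

diffs = horizontal_sub_saturated(first_products, second_products)
print(diffs)
```

Here 30000-(-10000) would be 40000 and is clamped to 32767, while -30000-10000 would be -40000 and is clamped to -32768, so the sketch prints `[32767, -2, -32768, 60]`.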

5. (Original) The method of claim 1, said third destination storing elements to represent 
horizontal addition operations in a register specified by bits three through five of the third 
single instruction. 
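
By way of example, selecting a register from bits three through five of an instruction amounts to a shift and a mask. The following minimal Python sketch assumes that the relevant portion of the instruction is available as an integer whose bit 0 is the least significant bit; the function name `destination_register` is hypothetical, and the claim does not specify which byte of the encoding carries the field.

```python
def destination_register(instruction_bits):
    """Return the 3-bit field held in bits 3 through 5 (bit 0 = LSB, assumed)."""
    return (instruction_bits >> 3) & 0b111


# Example: bits 5..3 of 0b11_010_001 are 010, which selects register 2.
register_index = destination_register(0b11010001)
print(register_index)
```

A three-bit field of this kind can address eight registers, numbered 0 through 7.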

6. (Original) The method of claim 5, said third destination storing elements to represent 
saturated arithmetic sums, (s2w6+s3w7, s0w4+s1w5, r2w2+r3w3, and r0w0+r1w1). 

7. (Original) The method of claim 5, said third destination comprising packed 16-bit 
elements to represent said four packed sums, (s2w6+s3w7, s0w4+s1w5, r2w2+r3w3, and 
r0w0+r1w1). 

8. (Original) The method of claim 5, said third destination comprising packed 32-bit 
elements to represent said four packed sums, (s2w6+s3w7, s0w4+s1w5, r2w2+r3w3, and 
r0w0+r1w1). 

9. (Original) The method of claim 8, said third destination storing elements to represent 
horizontal floating-point arithmetic operations. 

10. (Original) A processor comprising: 

a storage area to store a first packed data operand, a second packed data operand 
and a third packed data operand; and 

an execution unit coupled to said storage area, the execution unit to execute a first 
single instruction on data elements in said first packed data operand and said second 
packed data operand to generate a plurality of data elements in a first packed data result, 
at least one of said plurality of data elements in said first packed data result being the 
result of an intra-add operation performed on a first pair of data elements of said first 
packed data operand and at least one other of said plurality of data elements in said first 
packed data result being the result of an intra-add operation performed on a second pair 
of data elements of said second packed data operand, the execution unit to execute a 
second single instruction on data elements in said third packed data operand and said 
second packed data operand to generate a plurality of data elements in a second packed 
data result, at least one of said plurality of data elements in said second packed data result 
being the result of an intra-subtract operation performed on a third pair of data elements 
of said third packed data operand and at least one other of said plurality of data elements 
in said second packed data result being the result of an intra-subtract operation performed 
on the second pair of data elements of said second packed data operand. 

11. (Original) The processor of claim 10, wherein each of said plurality of data elements 
in said first packed data result being the result of an intra-add operation with signed 
saturation. 

12. (Original) The processor of claim 11, wherein each of said plurality of data elements 
in said second packed data result being the result of an intra-subtract operation with 
signed saturation. 

13. (Original) The processor of claim 10, the execution unit, in response to said first 
single instruction, overwriting said first packed data operand with said first packed data 
result. 

14. (Original) An apparatus comprising: 

a first storage area for storing a first packed data operand, containing at least an A 
data element and a B data element packed together; 

a second storage area for storing a second packed data operand containing at least 
a C data element and a D data element packed together; and 

an arithmetic circuit responsive to execution of a first single instruction to add the 
A data element and the B data element to generate a first result element of a third packed 
data, and to add the C data element and the D data element to generate a second result 
element of the third packed data, the arithmetic circuit responsive to execution of a 
second single instruction to subtract the A data element and the B data element to 
generate a third result element of a fourth packed data, and to subtract the C data element 
and the D data element to generate a fourth result element of the fourth packed data. 
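
The intra-add and intra-subtract behavior recited in claim 14 can be illustrated in software. The following minimal Python sketch assumes two-element lists for the packed operands and assumes that "subtract the A data element and the B data element" means A minus B; the claim itself does not fix the order of subtraction or the position of the result elements, and the function names `intra_add` and `intra_sub` are hypothetical.

```python
def intra_add(first_operand, second_operand):
    """Model of the first single instruction: add the pair inside each operand."""
    a, b = first_operand
    c, d = second_operand
    return [a + b, c + d]   # first and second result elements


def intra_sub(first_operand, second_operand):
    """Model of the second single instruction: subtract the pair inside each
    operand (A - B and C - D, an assumed ordering)."""
    a, b = first_operand
    c, d = second_operand
    return [a - b, c - d]   # third and fourth result elements


# Example values (arbitrary).
first_packed = [7, 3]     # A, B
second_packed = [10, 4]   # C, D

third_packed = intra_add(first_packed, second_packed)
fourth_packed = intra_sub(first_packed, second_packed)
print(third_packed, fourth_packed)
```

With these example values the sketch prints `[10, 14] [4, 6]`. Both results are computed from the same two source operands, which are left unmodified.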

15. (Original) The apparatus of claim 14 wherein each of said plurality of data elements 
in said third packed data and said fourth packed data are the result of an intra-add 
operation or an intra-subtract operation with signed saturation. 

16. (Original) The apparatus of claim 14 further comprising: 

a decoder to decode said first single instruction and said second single instruction 
and to enable execution of said first single instruction and said second single instruction; 
and 

a register file comprising said first storage area and said second storage area, to 
provide the A data element, the B data element, the C data element and the D data 
element responsive to the execution of said first or second single instruction. 

17. (Original) A system comprising: 

a first storage area for storing a first packed data operand, containing at least an A 
data element and a B data element packed together; 

a second storage area for storing a second packed data operand containing at least 
a C data element and a D data element packed together; 

a third storage area for storing a third packed data operand containing at least an E 
data element and an F data element packed together; 

a decoder to decode a first single instruction and to enable execution of said first 
single instruction, the decoder to decode a second single instruction and to enable 
execution of said second single instruction; 

an arithmetic circuit responsive to enabling execution of said first single 
instruction to add the A data element and the B data element to generate a first result 
element of a fourth packed data, and to add the C data element and the D data element to 
generate a second result element of the fourth packed data, the arithmetic circuit 
responsive to enabling execution of said second single instruction to subtract the E data 
element and the F data element to generate a third result element of a fifth packed data, 
and to subtract the C data element and the D data element to generate a fourth result 
element of the fifth packed data; 

a wireless communication device to send and receive digital data over a wireless 
network; 

a memory to store digital data and software including the first and second single 
instructions and to supply the first and second single instructions to said decoder; and 

an input output system responsive to said software to interface with the wireless 
communication device receiving data to process or sending data processed at least in part 
by said first and second single instructions. 

18. (Original) The system of claim 17, wherein each of said first and second result 
elements of the fourth packed data are the result of an intra-add operation with signed 
saturation. 

19. (Original) The system of claim 18, wherein each of said third and fourth result 
elements of the fifth packed data are the result of an intra-subtract operation with signed 
saturation. 

20. (Original) The system of claim 17, wherein each of said first and second result 
elements of the fourth packed data and said third and fourth result elements of the fifth 
packed are 16-bit results. 

21. (Original) The system of claim 17, wherein each of said first and second result 
elements of the fourth packed data and said third and fourth result elements of the fifth 
packed are 32-bit results. 

22. (Original) The system of claim 21, wherein each of said first and second result 
elements of the fourth packed data and said third and fourth result elements of the fifth 
packed data are floating point results. 

23. (Original) The system of claim 21, wherein each of said first and second result 
elements of the fourth packed data and said third and fourth result elements of the fifth 
packed data are unsaturated signed integer results. 

24. (Currently Amended) An article of manufacture [[computer software product]] 
including one or more recordable media having executable instructions stored thereon 
including a first instruction and a second instruction which, when executed by a 
processing device, cause the processing device to: 

access a first packed data operand, containing at least an A data element and a B 
data element packed together at a first storage area; 

access a second packed data operand containing at least a C data element and a D 
data element packed together at a second storage area; 

add the A data element and the B data element to generate a first result element of 
a third packed data in response to said first instruction; 

add the C data element and the D data element to generate a second result element 
of the third packed data in response to said first instruction; 

access a fourth storage area for storing a fourth packed data operand containing at 
least an E data element and an F data element packed together; 

access the second packed data operand containing at least the C data element and 
the D data element packed together at the second storage area; 

subtract the E data element and the F data element to generate a third result 
element of a fifth packed data in response to said second instruction; and 

subtract the C data element and the D data element to generate a fourth result 
element of the fifth packed data in response to said first instruction. 

25. (Currently Amended) The article of manufacture [[computer software product]] of 
claim 24 which, when executed by a processing device, further cause the processing 
device to: 

overwrite said first packed data operand with said third packed data in response to 
said first instruction. 

26. (Currently Amended) The article of manufacture [[computer software product]] of 
claim 24 which, when executed by a processing device, further cause the processing 
device to: 

overwrite said fourth packed data operand with said fifth packed data in response 
to said second instruction. 

27. (Currently Amended) The article of manufacture [[system]] of claim 24, wherein 
each of said first and second result elements of the third packed data and said third and 
fourth result elements of the fifth packed data are saturated signed values. 

28. (Currently Amended) The article of manufacture [[system]] of claim 24, wherein 
each of said first and second result elements of the third packed data and said third and 
fourth result elements of the fifth packed data are 16-bit integer values. 

29. (Currently Amended) The article of manufacture [[system]] of claim 24, wherein 
each of said first and second result elements of the third packed data and said third and 
fourth result elements of the fifth packed data are 32-bit integer values. 